Inria | Raweb 2018 | Presentation of the Project-Team CQFD

Section: New Results

On the expected total cost with unbounded returns for Markov decision processes

We consider a discrete-time Markov decision process with Borel state and action spaces. The performance criterion is to maximize a total expected utility determined by an unbounded return function. The existence of optimal strategies is shown under general conditions that allow the reward function to be unbounded both from above and from below, and that do not require the action sets available to the decision maker at each step to be compact. To deal with unbounded reward functions, a new characterization of the weak convergence of probability measures is derived. Our results are illustrated by examples.

Authors: François Dufour (Inria CQFD) and Alexandre Genadot (Inria CQFD).
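
For readers unfamiliar with the setting, the total expected utility criterion described in the abstract can be sketched in generic notation. All symbols below are assumptions introduced for illustration and are not taken from the paper: x_t and a_t denote the state and action at step t, r the return function, π a strategy, and Π the set of admissible strategies. The conditions under which the infinite sum is well defined when r is unbounded from above and below are part of the paper's contribution and are not reproduced here.

```latex
% Illustrative notation only (assumed, not the authors' own):
% total expected utility of strategy \pi started from state x,
% and the associated optimal value function.
\[
  V(\pi, x) \;=\; \mathbb{E}^{\pi}_{x}\!\left[\, \sum_{t=0}^{\infty} r(x_t, a_t) \right],
  \qquad
  V^{*}(x) \;=\; \sup_{\pi \in \Pi} V(\pi, x).
\]
% A strategy \pi^{*} is optimal if it attains the supremum from every initial state:
\[
  V(\pi^{*}, x) \;=\; V^{*}(x) \quad \text{for all } x .
\]
```

In this notation, the existence result stated in the abstract says that such a π* exists even when r is unbounded and the admissible action sets are not compact.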
